Automated Translation between Lexicon and Corpora
Authors
Abstract
In this work we show the role of lexical resources in machine translation (MT), giving several examples after a brief overview of MT studies. We then argue that MT processes need a richer lexicon and sketch a methodology for obtaining one through a mix of corpus-based and machine learning approaches.
Similar resources
Cognate Mapping - A Heuristic Strategy for the Semi-Supervised Acquisition of a Spanish Lexicon from a Portuguese Seed Lexicon
We deal with the automated acquisition of a Spanish medical subword lexicon from an already existing Portuguese seed lexicon. Using two non-parallel monolingual corpora we determined Spanish lexeme candidates from Portuguese seed lexicon entries by heuristic cognate mapping. We validated the emergent lexical translation hypotheses by determining the similarity of fixed-window context vectors on...
Semi-Supervised Acquisition of a Spanish Lexicon from a Portuguese Seed Lexicon
This paper deals with the automated acquisition of a Spanish medical subword lexicon from an already existing Portuguese seed lexicon. Using two nonparallel monolingual corpora we determine Spanish lexeme candidates from Portuguese seed lexicon entries by heuristic cognate mapping. We are still working on the experiments and trying to achieve a good method for validating the translation hypothe...
Data-driven Amharic-English Bilingual Lexicon Acquisition
This paper describes a simple statistical language modelling approach to bilingual lexicon acquisition from Amharic-English parallel corpora. The goal is to induce a seed translation lexicon from sentence-aligned corpora. The seed translation lexicon contains matches of Amharic lexemes to weakly inflected English words. Purely statistical measures of term distribution are used as the basis ...
Translation Lexicon Estimates from Non-Parallel Corpora Pairs
The estimation of translation lexicon probabilities from parallel corpora is well studied in statistical machine translation. Whenever parallel corpora are not available, it is still possible to obtain unsupervised estimates from pairs of monolingual, non-parallel corpora. In both cases the standard estimator is the Expectation-Maximization (EM) algorithm, which aims at increasing the likelihood of the sou...
Adapted Seed Lexicon and Combined Bidirectional Similarity Measures for Translation Equivalent Extraction from Comparable Corpora
An improved method for extracting translation equivalents from bilingual comparable corpora according to contextual similarity was developed. This method has two main features. First, a seed bilingual lexicon, which is used to bridge contexts in different languages, is adapted to the corpora from which translation equivalents are to be extracted. Second, the contextual similarity is evaluated by ...